50 research outputs found

    Multi-modal joint embedding for fashion product retrieval

    © 20xx IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Finding a product in the fashion world can be a daunting task. Every day, e-commerce sites are updated with thousands of images and their associated metadata (textual information), deepening the problem, akin to finding a needle in a haystack. In this paper, we leverage both the images and the textual metadata and propose a joint multi-modal embedding that maps both text and images into a common latent space. Distances in the latent space correspond to similarity between products, allowing us to perform retrieval in this latent space both efficiently and accurately. We train this embedding on large-scale real-world e-commerce data, both minimizing the distance between related products and using auxiliary classification networks that encourage the embedding to have semantic meaning. We compare against existing approaches and show significant improvements in retrieval tasks on a large-scale e-commerce dataset. We also provide an analysis of the different metadata.
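    The retrieval mechanism the abstract describes can be sketched in a few lines: both modalities are projected into a shared latent space and products are ranked by cosine similarity. The example below uses fixed random projections as stand-ins for the paper's learned encoders (all names and dimensions are hypothetical), so it illustrates only the retrieval step, not the training:

```python
import numpy as np

# Hypothetical stand-ins for the learned encoders: in the paper these are
# neural networks; here we use fixed random projections for illustration.
rng = np.random.default_rng(0)
W_img = rng.standard_normal((128, 32))   # image features -> latent space
W_txt = rng.standard_normal((300, 32))   # text features  -> latent space

def embed(features, W):
    """Project features into the shared latent space and L2-normalize."""
    z = features @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# A small toy catalogue: 5 products, each with an image and a text vector.
img_feats = rng.standard_normal((5, 128))
txt_feats = rng.standard_normal((5, 300))
catalogue = embed(img_feats, W_img)

# Querying with the text of product 2: retrieval is a nearest-neighbour
# search by cosine similarity in the latent space.
query = embed(txt_feats[2], W_txt)
scores = catalogue @ query
best = int(np.argmax(scores))
```

    Because the rows are unit-normalized, the dot product is exactly the cosine similarity, which is why nearest-neighbour search in the latent space is both cheap and meaningful once the encoders are trained.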

    3D pose estimation using convolutional neural networks

    This Master's thesis describes a new pose estimation method based on Convolutional Neural Networks (CNNs). The method divides the three-dimensional space into several regions and, given an input image, returns the region where the camera is located. The first step is to create synthetic images of the object, simulating a camera located at different points around it. The CNN is pre-trained with these thousands of synthetic images of the object model. We then compute the pose of the object in hundreds of real images and apply transfer learning with these labeled real images over the existing CNN, in order to refine the weights of the neurons and improve the network's behaviour on real input images. Along with this deep-learning approach, other techniques have been used to improve the quality of the results, such as the classical sliding window or a more recent class-generic object detector called objectness. The method is tested with a 2D model in order to ease the labeling process of the real images. This document outlines all the steps followed to create and test the method, and finally compares it against a state-of-the-art method at different scales and levels of blurring.
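    The space-partitioning step the thesis describes (returning the region where the camera is located) amounts to discretizing the viewing sphere around the object into class labels. The binning below is a hypothetical sketch of that labeling, not the thesis's actual partition:

```python
import numpy as np

def pose_to_region(cam_xyz, n_azimuth=8, n_elevation=4):
    """Map a camera position around the object to a discrete region label,
    i.e. the class the CNN is trained to predict (binning is illustrative)."""
    x, y, z = cam_xyz
    azimuth = np.arctan2(y, x) % (2 * np.pi)       # angle in [0, 2*pi)
    r = np.linalg.norm(cam_xyz)
    elevation = np.arcsin(z / r)                   # angle in [-pi/2, pi/2]
    a_bin = int(azimuth / (2 * np.pi) * n_azimuth) % n_azimuth
    e_bin = min(int((elevation + np.pi / 2) / np.pi * n_elevation),
                n_elevation - 1)
    return e_bin * n_azimuth + a_bin

# A camera on the equator, straight down the x-axis:
label = pose_to_region((1.0, 0.0, 0.0))
```

    Each synthetic rendering is then labeled with the region of the camera that produced it, turning pose estimation into an ordinary classification problem.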

    Multi-modal fashion product retrieval

    Finding a product in the fashion world can be a daunting task. Every day, e-commerce sites are updated with thousands of images and their associated metadata (textual information), deepening the problem. In this paper, we leverage both the images and the textual metadata and propose a joint multi-modal embedding that maps both text and images into a common latent space. Distances in the latent space correspond to similarity between products, allowing us to effectively perform retrieval in this latent space. We compare against existing approaches and show significant improvements in retrieval tasks on a large-scale e-commerce dataset.

    Multi-modal embedding for main product detection in fashion

    Best Paper Award at the 2017 IEEE International Conference on Computer Vision Workshops. We present an approach to detect the main product in fashion images by exploiting the textual metadata associated with each image. Our approach is based on a Convolutional Neural Network and learns a joint embedding of object proposals and textual metadata to predict the main product in the image. We additionally use several complementary classification and overlap losses in order to improve training stability and performance. Our tests on a large-scale dataset taken from eight e-commerce sites show that our approach outperforms strong baselines and is able to accurately detect the main product in a wide diversity of challenging fashion images.
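    The selection step can be sketched as scoring object proposals and keeping the best one, with an IoU measure of the kind an overlap loss would supervise against. The proposal scores below are hypothetical placeholders for the output of the learned joint embedding:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Hypothetical scores of three object proposals against the product's
# text metadata (in the paper these come from the learned joint model).
proposals = [(0, 0, 50, 50), (10, 10, 90, 90), (60, 60, 100, 100)]
scores = np.array([0.2, 0.9, 0.4])
main = proposals[int(np.argmax(scores))]

# Overlap with a ground-truth main-product box, the quantity an
# overlap loss would push the model to maximize during training.
gt = (5, 5, 95, 95)
overlap = iou(main, gt)
```
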

    BASS: boundary-aware superpixel segmentation

    We propose a new superpixel algorithm based on exploiting the boundary information of an image, as objects in images can generally be described by their boundaries. Our approach initially estimates the boundaries and uses them to place superpixel seeds in the areas in which they are denser. Afterwards, we minimize an energy function in order to expand the seeds into full superpixels. In addition to standard terms such as color consistency and compactness, we propose using the geodesic distance, which concentrates small superpixels in regions of the image with more information while letting larger superpixels cover more homogeneous regions. By improving both the initialization using the boundaries and the coherency of the superpixels with geodesic distances, we are able to maintain the coherency of the image structure with fewer superpixels than other approaches. The resulting algorithm yields smaller Variation of Information metrics on seven different datasets while maintaining Undersegmentation Error values similar to those of state-of-the-art methods.
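    The boundary-density seeding idea can be sketched with a toy boundary map: seeds are sampled with probability proportional to local boundary strength, so boundary-rich areas receive more, smaller superpixels. Everything below (the map, the floor constant, the seed count) is an illustrative assumption, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy boundary-strength map (in BASS this would come from a boundary
# detector): strong boundaries on the left half, homogeneous right half.
H, W = 32, 32
boundary = np.zeros((H, W))
boundary[:, :16] = 1.0

# Density-aware seeding: sample superpixel seeds with probability
# proportional to boundary strength, plus a small floor so homogeneous
# regions still receive a few (larger) superpixels.
density = boundary + 0.05
prob = density.ravel() / density.sum()
n_seeds = 50
idx = rng.choice(H * W, size=n_seeds, replace=False, p=prob)
seeds = np.column_stack(np.unravel_index(idx, (H, W)))

# Most seeds land in the boundary-dense half of the image.
left = int((seeds[:, 1] < 16).sum())
```
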

    Estimación monocular y eficiente de la pose usando modelos 3D complejos

    Paper presented at the XXXV Jornadas de Automática, held in Valencia, September 3-5, 2014. Infaimon award for the best vision paper. This paper presents a robust and efficient method for estimating the pose of a camera. The proposed method assumes prior knowledge of a 3D model of the environment and compares a new input image only against a small set of similar images previously selected by a Bag of Visual Words algorithm. In this way, it avoids the high computational cost of matching the 2D points of the input image against all the 3D points of a complex model, which in our case contains more than 100,000 points. The pose is estimated from these 2D-3D correspondences using a novel PnP algorithm that rejects outliers without the need for RANSAC and is between 10 and 100 times faster than methods that use it. This work was partially funded by the projects RobTaskCoop DPI2010-17112, ERA-Net Chistera ViSen PCIN-2013-047, and the EU project ARCAS FP7-ICT-2011-287617.

    Efficient monocular pose estimation for complex 3D models

    Paper presented at ICRA, held in Seattle (US), May 26-30, 2015. We propose a robust and efficient method to estimate the pose of a camera with respect to complex 3D textured models of the environment that can potentially contain more than 100,000 points. To tackle this problem we follow a top-down approach in which we combine high-level deep network classifiers with low-level geometric approaches to come up with a solution that is fast, robust and accurate. Given an input image, we initially use a pre-trained deep network to compute a rough estimate of the camera pose. This initial estimate constrains the number of 3D model points that can be seen from the camera viewpoint. We then establish 3D-to-2D correspondences between these potentially visible points of the model and the 2D detected image features. Accurate pose estimation is finally obtained from these 3D-to-2D correspondences using a novel PnP algorithm that rejects outliers without the need for a RANSAC strategy, and which is between 10 and 100 times faster than other methods that use it. Two real experiments dealing with very large and complex 3D models demonstrate the effectiveness of the approach. This work has been partially funded by the Spanish Ministry of Economy and Competitiveness under the ERA-Net Chistera project ViSen PCIN-2013-047 and projects PAU+ DPI2011-27510 and ROBOT-INT-COOP DPI2013-42458-P, and by the EU project ARCAS FP7-ICT-2011-28761.
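    The RANSAC-free outlier rejection can be illustrated with a simple robust rule on reprojection residuals: errors far from the median (in median-absolute-deviation units) are discarded in one pass. This is only a sketch of the idea on synthetic residuals; the paper's actual PnP algorithm is more sophisticated:

```python
import numpy as np

rng = np.random.default_rng(1)

def reject_outliers(residuals, k=2.5):
    """Flag correspondences whose reprojection error is far from the
    median: a robust, single-pass rejection rule needing no RANSAC
    sampling (a sketch of the idea, not the paper's algorithm)."""
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med)) + 1e-9
    return residuals < med + k * mad

# Synthetic reprojection errors in pixels: 90 inliers (under ~2 px)
# followed by 10 gross outliers from wrong 3D-to-2D matches.
errors = np.concatenate([rng.uniform(0.0, 2.0, 90),
                         rng.uniform(50.0, 100.0, 10)])
inlier_mask = reject_outliers(errors)
```

    Because the rule needs only one pass over the residuals instead of repeated random sampling and model refits, it is easy to see where an order-of-magnitude speedup over RANSAC-based pipelines can come from.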

    Evolving trends in the management of acute appendicitis during COVID-19 waves. The ACIE Appy II study

    Background: In 2020, the ACIE Appy study showed that the COVID-19 pandemic heavily affected the management of patients with acute appendicitis (AA) worldwide, with an increased rate of non-operative management (NOM) strategies and a trend toward open surgery due to concern about virus transmission by laparoscopy and controversial recommendations on this issue. The aim of this study was to survey the same group of surgeons again to assess whether any difference in the management attitudes toward AA had occurred in the later stages of the outbreak. Methods: From August 15 to September 30, 2021, an online questionnaire was sent to all 709 participants of the ACIE Appy study. The questionnaire included questions on personal protective equipment (PPE), local policies and screening for SARS-CoV-2 infection, NOM, surgical approach and disease presentations in 2021. The results were compared with those of the previous study. Results: A total of 476 answers were collected (response rate 67.1%). Screening policies improved significantly, with most patients screened regardless of symptoms (89.5% vs. 37.4%) and PCR and antigen tests as the preferred tests (74.1% vs. 26.3%). More patients tested positive before surgery, and commercial systems were preferred for filtering smoke plumes during laparoscopy. Laparoscopic appendicectomy was the first option in the treatment of AA, with a decline in the use of NOM. Conclusion: The management of AA has improved during the later waves of the pandemic. Increased evidence regarding SARS-CoV-2 infection, along with a timely response by healthcare systems, has translated into tailored attitudes and better care for patients with AA worldwide.

    Mortality and pulmonary complications in patients undergoing surgery with perioperative SARS-CoV-2 infection: an international cohort study

    Background: The impact of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) on postoperative recovery needs to be understood to inform clinical decision making during and after the COVID-19 pandemic. This study reports 30-day mortality and pulmonary complication rates in patients with perioperative SARS-CoV-2 infection. Methods: This international, multicentre, cohort study at 235 hospitals in 24 countries included all patients undergoing surgery who had SARS-CoV-2 infection confirmed within 7 days before or 30 days after surgery. The primary outcome measure was 30-day postoperative mortality and was assessed in all enrolled patients. The main secondary outcome measure was pulmonary complications, defined as pneumonia, acute respiratory distress syndrome, or unexpected postoperative ventilation. Findings: This analysis includes 1128 patients who had surgery between Jan 1 and March 31, 2020, of whom 835 (74·0%) had emergency surgery and 280 (24·8%) had elective surgery. SARS-CoV-2 infection was confirmed preoperatively in 294 (26·1%) patients. 30-day mortality was 23·8% (268 of 1128). Pulmonary complications occurred in 577 (51·2%) of 1128 patients; 30-day mortality in these patients was 38·0% (219 of 577), accounting for 81·7% (219 of 268) of all deaths. In adjusted analyses, 30-day mortality was associated with male sex (odds ratio 1·75 [95% CI 1·28–2·40], p<0·0001), age 70 years or older versus younger than 70 years (2·30 [1·65–3·22], p<0·0001), American Society of Anesthesiologists grades 3–5 versus grades 1–2 (2·35 [1·57–3·53], p<0·0001), malignant versus benign or obstetric diagnosis (1·55 [1·01–2·39], p=0·046), emergency versus elective surgery (1·67 [1·06–2·63], p=0·026), and major versus minor surgery (1·52 [1·01–2·31], p=0·047). Interpretation: Postoperative pulmonary complications occur in half of patients with perioperative SARS-CoV-2 infection and are associated with high mortality. Thresholds for surgery during the COVID-19 pandemic should be higher than during normal practice, particularly in men aged 70 years and older. Consideration should be given to postponing non-urgent procedures and promoting non-operative treatment to delay or avoid the need for surgery. Funding: National Institute for Health Research (NIHR), Association of Coloproctology of Great Britain and Ireland, Bowel and Cancer Research, Bowel Disease Research Foundation, Association of Upper Gastrointestinal Surgeons, British Association of Surgical Oncology, British Gynaecological Cancer Society, European Society of Coloproctology, NIHR Academy, Sarcoma UK, Vascular Society for Great Britain and Ireland, and Yorkshire Cancer Research.